stochastic case
Best of both worlds: Stochastic & adversarial best-arm identification
Abbasi-Yadkori, Yasin, Bartlett, Peter L., Gabillon, Victor, Malek, Alan, Valko, Michal
We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
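As a rough illustration of the "simple random uniform learner" mentioned in the abstract, the sketch below pulls an arm uniformly at random in every round and then recommends the arm with the highest empirical mean. The function name `uniform_best_arm`, the `pull(arm, t)` callback, and the use of a plain empirical mean are assumptions made for this example; the paper's own estimator and recommendation rule may differ.

```python
import random


def uniform_best_arm(pull, n_arms, budget, rng=None):
    """Sample arms uniformly at random, then recommend the best empirical arm.

    `pull(arm, t)` is a hypothetical callback returning the reward of `arm`
    at round `t`; rewards may be stochastic or chosen by an adversary.
    """
    rng = rng or random.Random(0)
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(budget):
        arm = rng.randrange(n_arms)  # uniform choice, independent of the past
        sums[arm] += pull(arm, t)
        counts[arm] += 1
    # Arms that were never pulled get -inf so they are never recommended.
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_arms), key=means.__getitem__)


# Usage with fixed (deterministic) rewards, purely for illustration.
rewards = [0.1, 0.9, 0.5]
best = uniform_best_arm(lambda arm, t: rewards[arm], n_arms=3, budget=300)
```

Because the sampling never adapts to observed rewards, an adversary cannot exploit it, which is also why it wastes pulls on clearly bad arms when the rewards are stochastic.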
Optimal Algorithms for Decentralized Stochastic Variational Inequalities
Variational inequalities are a formalism that includes games, minimization, saddle point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consider decentralized stochastic (sum-type) variational inequalities over fixed and time-varying networks. We present lower complexity bounds for both communication and local iterations and construct optimal algorithms that match these lower bounds. Our algorithms are the best among the available literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases. Experimental results confirm the effectiveness of the presented algorithms.
Stochastic Optimal Control via Measure Relaxations
Buehrle, Etienne, Stiller, Christoph
The optimal control problem of stochastic systems is commonly solved via robust [2, 21] or scenario-based [7, 19, 17] optimization methods, which are both challenging to scale to long optimization horizons due to their open-loop nature. Dynamic programming formulations [4], while applicable to stochastic systems, typically involve nonconvex optimization problems and do not support specifying the terminal distribution. Polynomial optimization has been proposed for deterministic nonlinear [11] and hybrid systems [16]. We extend the method to stochastic systems using a weak formulation of the Fokker-Planck equation. As a cost function, we propose to use the Christoffel polynomial, which can be estimated from data.
Fair Online Bilateral Trade
Bachoc, François, Cesa-Bianchi, Nicolò, Cesari, Tommaso, Colomboni, Roberto
In online bilateral trade, a platform posts prices to incoming pairs of buyers and sellers that have private valuations for a certain good. If the price is lower than the buyers' valuation and higher than the sellers' valuation, then a trade takes place. Previous work focused on the platform perspective, with the goal of setting prices maximizing the gain from trade (the sum of sellers' and buyers' utilities). Gain from trade is, however, potentially unfair to traders, as they may receive highly uneven shares of the total utility. In this work we enforce fairness by rewarding the platform with the fair gain from trade, defined as the minimum between sellers' and buyers' utilities. After showing that any no-regret learning algorithm designed to maximize the sum of the utilities may fail badly with fair gain from trade, we present our main contribution: a complete characterization of the regret regimes for fair gain from trade when, after each interaction, the platform only learns whether each trader accepted the current price. Specifically, we prove the following regret bounds: $\Theta(\ln T)$ in the deterministic setting, $\Omega(T)$ in the stochastic setting, and $\tilde{\Theta}(T^{2/3})$ in the stochastic setting when sellers' and buyers' valuations are independent of each other. We conclude by providing tight regret bounds when, after each interaction, the platform is allowed to observe the true traders' valuations.
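The two objectives contrasted in the abstract can be written out for a single interaction. The following is a minimal sketch, assuming utilities are measured as the distance between the posted price and each trader's valuation and that ties count as a trade; the function name `trade_outcome` is hypothetical.

```python
def trade_outcome(price, seller_value, buyer_value):
    """Return (gain_from_trade, fair_gain_from_trade) for one posted price.

    Assumption: a trade happens when seller_value <= price <= buyer_value.
    """
    if not (seller_value <= price <= buyer_value):
        return 0.0, 0.0  # no trade, so both utilities are zero
    seller_utility = price - seller_value
    buyer_utility = buyer_value - price
    gain = seller_utility + buyer_utility      # sum of utilities
    fair_gain = min(seller_utility, buyer_utility)  # minimum of utilities
    return gain, fair_gain


# Same valuations, two different prices.
print(trade_outcome(0.25, 0.25, 1.0))   # (0.75, 0.0)
print(trade_outcome(0.625, 0.25, 1.0))  # (0.75, 0.375)
```

Both prices yield the same gain from trade, but only the midpoint price splits it evenly, which is why an algorithm tuned for the sum can do poorly on the fair objective.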
Stochastic mean-shift clustering
It estimates the probability density function of a random variable (Fukunaga & Hostetler, 1975). The clustering algorithm has been applied in a variety of areas, such as image segmentation (Tao et al., 2007; Paris & Durand, 2007), particularly of medical and satellite images (Lu et al., 2011; Ai & Xiong, 2014; Wu & Luo, 2015; Banerjee et al., 2012), video (Wang et al., 2004), and high-dimensional data clustering (Saptarshi et al., 2021). An adapted version of mean-shift clustering was applied to speaker clustering of short segments (Salmun et al., 2016b,a, 2017; Cohen & Lapidot, 2021). The algorithm is deterministic: in an iterative procedure, it estimates the multi-modal probability density function (pdf) via the "climbing" path of each datum to its mode. All data points that reach the same mode are grouped into the same cluster.
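The deterministic procedure described above can be sketched in a few lines for one-dimensional data. This is only an illustration under stated assumptions: a Gaussian kernel, a fixed `bandwidth`, and a merge tolerance `merge_tol` for deciding when two modes coincide are all choices made for this example, and the function name `mean_shift_1d` is hypothetical. It is not the stochastic variant named in the title.

```python
import math


def mean_shift_1d(points, bandwidth, n_iter=50, merge_tol=1e-3):
    """Deterministic mean shift on 1-D data with a Gaussian kernel (sketch)."""
    modes = []
    for x in points:
        # Each datum "climbs" toward a mode of the kernel density estimate.
        for _ in range(n_iter):
            weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
            x_new = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            converged = abs(x_new - x) < 1e-9
            x = x_new
            if converged:
                break
        modes.append(x)

    # Points whose climbs end at (numerically) the same mode share a cluster.
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) < merge_tol:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


labels, centers = mean_shift_1d([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], bandwidth=0.5)
```

With two well-separated groups and a bandwidth that is small relative to the gap between them, the points in each group climb to a shared mode and receive the same label.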
Use of a Multi-Layer Perceptron to Predict Malignancy in Ovarian Tumors
The objective of RL is to find, thanks to a reinforcement signal, an optimal strategy for solving a dynamical control problem. Here we study the continuous-time, continuous state-space stochastic case, which covers a wide variety of control problems including target, viability, and optimization problems (see [FS93], [KP95]), for which a formalism is the following.